1 Parity logging overcoming the small write problem in redundant disk arrays 100%
Daniel Stodolsky, Garth Gibson, Mark Holland
ACM SIGARCH Computer Architecture News, Proceedings of the 20th
annual international symposium on Computer architecture May 1993
Volume 21 Issue 2

Parity encoded redundant disk arrays provide highly reliable, cost 
effective secondary storage with high performance for read 
accesses and large write accesses. Their performance on small 
writes, however, is much worse than mirrored disks—the 
traditional, highly reliable, but expensive organization for 
secondary storage. Unfortunately, small writes are a substantial 
portion of the I/O workload of many important, demanding 
applications such as on-line transaction processing. This paper ... 

2 Parity logging disk arrays 100%
Daniel Stodolsky, Mark Holland, William V. Courtright, Garth A. Gibson
ACM Transactions on Computer Systems (TOCS) August 1994
Volume 12 Issue 3

Parity-encoded redundant disk arrays provide highly reliable, 
cost-effective secondary storage with high performance for reads 
and large writes. Their performance on small writes, however, is 
1 The architecture of a fault-tolerant cached RAID controller 90%
Jai Menon, Jim Cortney

ACM SIGARCH Computer Architecture News , Proceedings of the 20th 
annual international symposium on Computer architecture May 1993 
Volume 21 Issue 2 

RAID-5 arrays need 4 disk accesses to update a data 
block—2 to read old data and parity, and 2 to write new 
data and parity. Schemes previously proposed to improve the 
update performance of such arrays are the Log-Structured File 
System [10] and the Floating Parity Approach [6]. Here, we 
consider a third approach, called Fast Write, which eliminates disk 
time from the host response time to a write, by using a 
Non-Volatile Cache in the disk array controller. We examine three 
alternativ ... 
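
The four accesses counted in the abstract above can be illustrated with a minimal sketch of a read-modify-write parity update. This is not the controller design from the paper: the in-memory `disks` list, the function names, and the access counter are assumptions made for illustration only.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(disks: list, data_idx: int, parity_idx: int, new_data: bytes) -> int:
    """Update one data block and its parity; return the number of disk accesses.

    `disks` is a hypothetical in-memory stand-in for the array: one block per disk.
    """
    accesses = 0
    old_data = disks[data_idx]        # access 1: read old data
    accesses += 1
    old_parity = disks[parity_idx]    # access 2: read old parity
    accesses += 1
    # New parity = old parity XOR old data XOR new data.
    new_parity = xor_blocks(old_parity, xor_blocks(old_data, new_data))
    disks[data_idx] = new_data        # access 3: write new data
    accesses += 1
    disks[parity_idx] = new_parity    # access 4: write new parity
    accesses += 1
    return accesses
```

Calling `small_write` on a three-block stripe returns 4 and leaves the parity block equal to the XOR of the data blocks.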

2 RAID: high-performance, reliable secondary storage 88%
Peter M. Chen, Edward K. Lee, Garth A. Gibson, Randy H. Katz, David A. Patterson
ACM Computing Surveys (CSUR) June 1994
Volume 26 Issue 2

Disk arrays were proposed in the 1980s as a way to use parallelism 
between multiple disks to improve aggregate I/O performance. 
Today they appear in the product lines of most major computer 
manufacturers. This article gives a comprehensive overview of disk 
1 A simulation model of GECOS III 99%
Kenneth E. Norland, William C. Bulgren
Proceedings of the 1971 annual conference January 1971

A simulation model for a multiprogramming operating system has 
been devised and programmed in Simscript. Essential elements of 
the environment have been included such as job arrival rate, 
maximum number of jobs, the operating system overhead and 
peripheral and core allocation. Some allowances are made for 
time-sharing, as well as remote and normal batch jobs. The model is 
patterned basically after GECOS III, on the H-600 line computer. 
The hardware constraints considered when necessary are ... 

2 The architecture of a fault-tolerant cached RAID controller 99%
Jai Menon, Jim Cortney

ACM SIGARCH Computer Architecture News , Proceedings of the 20th 
annual international symposium on Computer architecture May 1993 
Volume 21 Issue 2 

RAID-5 arrays need 4 disk accesses to update a data 
block—2 to read old data and parity, and 2 to write new data 
and parity. Schemes previously proposed to improve the update 
performance of such arrays are the Log-Structured File System [10]
and the Floating Parity Approach [6]. Here, we consider a third 
approach, called Fast Write, which eliminates disk time from the 
host response time to a write, by using a Non-Volatile Cache in the 
disk array controller. We examine three alternativ ... 
3 Striping in disk array RM2 enabling the tolerance of double disk failures 99%
Chan-Ik Park, Tae-Young Choe
Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM) November 1996

There is a growing demand in high reliability beyond what current 
RAID can provide and there are various levels of user demand for 
data reliability. An efficient data placement scheme called RM2 has 
been proposed in [11], which makes a disk array system tolerable 
against double disk failures. In this paper, we consider how to 
choose an optimal striping unit for RM2 particularly when no 
workload information is available except read/write ratio. A disk 
array simulator for RM2 has been developed fo ... 

4 RAID: high-performance, reliable secondary storage 98%
Peter M. Chen, Edward K. Lee, Garth A. Gibson, Randy H. Katz, David A. Patterson
ACM Computing Surveys (CSUR) June 1994
Volume 26 Issue 2

Disk arrays were proposed in the 1980s as a way to use parallelism 
between multiple disks to improve aggregate I/O performance. 
Today they appear in the product lines of most major computer 
manufacturers. This article gives a comprehensive overview of disk 
arrays and provides a framework in which to organize current and 
future work. First, the article introduces disk technology and reviews 
the driving forces that have popularized disk arrays: performance 
and reliability. It discusses the tw ... 
Oct 28, 1997

DOCUMENT-IDENTIFIER: US 5682396 A

TITLE: Control unit in storage subsystem with improvement of redundant data updating 



Detailed Description Text (21) : 

FIG. 22 shows the construction of the PG management information 2001. An empty 
pointer 2206 is used for linking empty management information 2202 with each other. 
An update before segment pointer 2200 indicates a segment 1800 in which the update 
before content of a record 1502 corresponding to the entry is stored. An update 
after segment pointer 2201 indicates a segment in which the update after value of a
record 1502 corresponding to the entry is stored. In the case where both the update 
before segment pointer 2200 and the update after segment pointer 2201 take null 
values, it is meant that the corresponding record 1502 is not stored in the cache 
1308. A write after bit 2202 is information indicating that a write after process 
1313 for a record 1502 corresponding to the entry should be performed. A load 
request bit 2203 is information indicating that a record 1502 corresponding to the 
entry should be loaded into the cache 1308. Since the update before segment pointer 
2200, the update after segment pointer 2201, the write after bit 2202 and the load 
request bit 2203 are provided corresponding to each record 1502, the PG management 
information 2001 includes each of those data which is (m+n) in number equal to the 
number of records 1502 included in the corresponding parity group 1600. Lock 
information 2204 indicates that the records 1502 in the parity group 1600 
corresponding to the PG management information 2001 under consideration are being 
operated. In a write action for the disk array using the data distribution by 
record, not only a data record 1501 but also all parity records 1501 are updated. 
Therefore, it is required that a write action for the same parity group 1600 is 
sequentially performed (or serialized) in accordance with the lock information 2204. 
Lock wait information 2205 is information indicating that a read/write request from
the processor 1300 is in a wait condition. The lock wait information 2205 is 
provided for ensuring that the write action will be performed sequentially. A parity 
generation bit 2206 is information indicating that records 1502 necessary for 
generation of updated values of parity records 1501 belonging to the parity group 
1600 corresponding to the PG management information 2001 under consideration are 
stored in the cache 1308. 
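
The layout of the PG management information described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class and field names are hypothetical, segment pointers are modeled as optional integers, and the lock wait information is modeled as a simple FIFO queue of requester names, which the patent text does not specify.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RecordEntry:
    """Per-record fields; one entry exists for each record in the parity group."""
    update_before_segment: Optional[int] = None  # segment holding the pre-update content
    update_after_segment: Optional[int] = None   # segment holding the post-update value
    write_after: bool = False                    # a write after process is pending
    load_request: bool = False                   # record should be loaded into the cache

    def in_cache(self) -> bool:
        # Both pointers null means the record is not stored in the cache.
        return not (self.update_before_segment is None
                    and self.update_after_segment is None)


@dataclass
class PGManagementInfo:
    m: int                                       # number of data records (assumed meaning)
    n: int                                       # number of parity records (assumed meaning)
    entries: List[RecordEntry] = field(default_factory=list)
    locked: bool = False                         # lock information
    lock_waiters: List[str] = field(default_factory=list)  # lock wait information (assumed FIFO)
    parity_generation: bool = False              # records needed for new parity are cached

    def __post_init__(self) -> None:
        if not self.entries:
            self.entries = [RecordEntry() for _ in range(self.m + self.n)]

    def acquire(self, requester: str) -> bool:
        """Serialize writes: grant the lock, or queue the requester if it is held."""
        if self.locked:
            self.lock_waiters.append(requester)
            return False
        self.locked = True
        return True

    def release(self) -> Optional[str]:
        """Release the lock, handing it to the oldest waiter if one exists."""
        if self.lock_waiters:
            return self.lock_waiters.pop(0)      # lock stays held on behalf of the waiter
        self.locked = False
        return None
```

With `PGManagementInfo(m=4, n=1)`, the structure holds five per-record entries, matching the (m+n) count stated in the text.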

Detailed Description Text (113) : 

From the foregoing, in step 3301 shown in FIG. 33, the update before data record 105 
is held in the cache 1308 as it is and the write data (corresponding to the update
after data record b 6401) accepted in the segment 1800 indicated by the update after
segment pointer 2201 is stored into the segment 1800 (corresponding to the update
before data record a 6302) in which the update after data record 106 has been
stored. In step 3302, a load request bit 2203 corresponding to the update before 
parity record 107 which has not been loaded in the cache 1308, is turned on. In step 
3303, lock information 2204 and disk unit occupy information 2004 are reset. 
Thereafter, in step 3304, the control unit 1305 reports the completion of the 
process to the processor 1300. 
File: USPT

Jul 10, 2001

DOCUMENT-IDENTIFIER: US 6260115 B1

TITLE: Sequential detection and prestaging methods for a disk storage subsystem

Detailed Description Text (34) :

FIG. 10 is a flow diagram of a locking process applied to each list 100. This 
process prevents conflicts between two or more threads of execution, for example in 
a multiprocessing environment, attempting to access the same logical storage device 
data structure simultaneously. Before making any changes to the list 100 for a given 
logical storage device, the control unit program must check the status of the lock, 
as shown by decision block 1000. If the list 100 is locked then the control unit 
software program must either wait for the list 100 to become unlocked, or abandon 
the attempt to change the list. When the control unit software program finds the 
list 100 unlocked, it locks the list 100, as shown in block 1002, to prevent 
interference from another thread of execution of the control unit software program. 
Next, the locking execution thread may change data within the list 100, as shown in 
block 1004. When all of the changes are finished, the list 100 is unlocked, as shown 
in block 1006, so that another thread of execution in the control unit software 
program can update the list 100. 

Detailed Description Text (35) : 

A variety of locking methods may be used to implement the locking process. For 
example, a spin-lock word (not shown) could be defined in the control word 104 for
each list 100. The spin-lock word allows only one competing thread of execution
access to the list 100, and it prevents that thread of execution from keeping list
100 locked for an indefinite time. Alternatively, a range lock (not shown) may be 
created to lock and unlock a range of memory addresses. The preferred approach is to 
use the update in progress bit 106 in the control word 104 to indicate whether the 
list 100 is currently locked, as shown in FIG. 1. The update in progress bit 106 
approach is simple, requires few processor cycles to lock and unlock, and consumes 
minimal memory. 
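
The check, lock, change, unlock sequence with an update in progress bit can be sketched as follows. This is a minimal illustration, not the patented implementation: the bit position, the class name, and the use of `threading.Lock` as a stand-in for an atomic test-and-set on the control word are all assumptions.

```python
import threading

UPDATE_IN_PROGRESS = 0x01  # hypothetical bit position within the control word


class TrackedList:
    """A list guarded by an update in progress bit in its control word."""

    def __init__(self) -> None:
        self.control_word = 0
        self.entries = []
        # Stand-in for a hardware atomic test-and-set (an assumption of this sketch).
        self._atomic = threading.Lock()

    def try_lock(self) -> bool:
        """Check the bit and set it if clear; return False if already locked."""
        with self._atomic:
            if self.control_word & UPDATE_IN_PROGRESS:
                return False
            self.control_word |= UPDATE_IN_PROGRESS
            return True

    def unlock(self) -> None:
        """Clear the bit so another thread of execution can update the list."""
        with self._atomic:
            self.control_word &= ~UPDATE_IN_PROGRESS

    def update(self, change) -> bool:
        """Apply `change` to the entries under the lock; abandon if locked."""
        if not self.try_lock():
            return False          # caller may retry later or give up
        try:
            change(self.entries)
        finally:
            self.unlock()
        return True
```

Here `update` abandons the attempt when the list is locked; a caller that prefers to wait could retry `update` in a loop instead.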
Nov 20, 1984

DOCUMENT-IDENTIFIER: US 4484270 A

TITLE: Centralized hardware control of multisystem access to shared and non-shared 
subsystems 

Detailed Description Text (469) : 

The SAU Reserve command of FIG. 34 locks the SAU onto the SSP interface that 
received the command. Other SSPs are denied access to the SAU until the reserving
SSP releases the interface or until an SAU Reset command is received from any SSP 
interface. A busy indication is returned for all commands (except SAU RESET) from 
other SSPs. 

Detailed Description Text (471) : 

The SAU Release command of FIG. 35 unlocks the SAU from an SSP interface. The 
release command must be received from the same SSP that caused the reserve 
condition. A reserve condition results from receipt of a Reserve command or the 
sending of Unit Check Status. 
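
The reserve, release, and reset rules above can be summarized in a short state sketch. It is illustrative only: the class, the string command names, and the "ok"/"busy" return values are assumptions, and the reserve condition caused by sending Unit Check Status is not modeled.

```python
class SAU:
    """Minimal model of SAU reservation by one SSP interface at a time."""

    def __init__(self) -> None:
        self.reserved_by = None  # SSP currently holding the reservation, if any

    def handle(self, ssp: str, command: str) -> str:
        # A reset from any SSP interface clears the reservation.
        if command == "RESET":
            self.reserved_by = None
            return "ok"
        # Every other command from a non-reserving SSP gets a busy indication.
        if self.reserved_by is not None and self.reserved_by != ssp:
            return "busy"
        if command == "RESERVE":
            self.reserved_by = ssp
        elif command == "RELEASE":
            self.reserved_by = None
        return "ok"
```

For example, after `handle("SSP0", "RESERVE")`, a `"RELEASE"` from `"SSP1"` returns `"busy"` and leaves the reservation in place, while a `"RESET"` from `"SSP1"` clears it.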



Detailed Description Text (578) : 

The expected response from an SSP is first a reserve command which locks the SAU 
onto the SSP interface that received the command and second, a command to load 
control store. The set of commands accepted by the SAU following its powering on is 
limited to these two plus two more: a reset command which causes the SAU to reset 
its registers and tables, and a release command which countermands the reserve 
command. 



Detailed Description Paragraph Table (8) :

TABLE 6

SAU COMMANDS

Command                          Bits 0-7   Code (hex)
Add Subsystem                    00000101   05
Remove Subsystem                 00010101   15
Write Subsystem IOP Number       00001001   09
Read Subsystem Interface Table   00001010   0A
Write IOP State                  00110001   31
Read IOP State                   00110010   32
Write SSP Number                 00100001   21
Read SSP Number                  00100010   22
Write Control Store              10000001   81
Read Control Store               10000010   82
SAU Reserve                      00010011   13
SAU Release                      00100011   23
SAU Reset                        11111111   FF
Read ID Word 0                   00001110   0E
Read ID Word 1                   00010010   12
Read SPI                         00101010   2A
Read BCTS Interface              00111010   3A
Set Test Mode                    00010111   17
Clear Test Mode                  00100111   27
Set SAU Lock                     01100111   67
Clear SAU Lock                   10000111   87
Write SSP History                10010001   91
Read SSP History                 10010010   92
File: USPT

May 8, 2001

DOCUMENT-IDENTIFIER: US 6230190 B1

TITLE: Shared-everything file storage for clustered system 



Detailed Description Text (7) : 

The file sharing protocol includes the Common Internet File System (CIFS) for 
Microsoft-based systems or the Network File System (NFS) for Unix-based systems. 
Alternatively, the file sharing protocols may be the Server Message Block (SMB) 
protocol, which is used over the Internet on top of its TCP/IP protocol or on top of 
other network protocols such as IPX or NetBEUI. The file sharing protocol supported 
by the RAID data storage device 106 or 108 provides a locking facility which may be 
a file locking facility or a byte-range locking facility. The locking facility
enhances data integrity for the file sharing environment of FIG. 1. Locking can be
used to coordinate concurrent access to a file by multiple applications and users.
It can prevent concurrent readers and writers of shared data from reading "stale"
data (i.e., data currently in the process of being updated by another application)
and/or overwriting each other's updates.
File: USPT

May 14, 2002

DOCUMENT-IDENTIFIER: US 6389511 B1

TITLE: On-line data verification and repair in redundant storage system 



Detailed Description Text (72) : 

The general method of verifying and repairing data in a parity redundancy group is 
analogous to the above method. For example, a primary disk adapter can be selected 
for examination of a redundancy group on a track-by-track basis, e.g., locking an 
individual track, writing the entire track from each applicable physical storage 
device to a cache, and then examining all of the data/parity information for that 
track before examining the next track. As for the above embodiment, of course, the 
advantages of the invention may be achieved using other design parameters. 

Detailed Description Text (84) : 

There are two possibilities once the existence of a data coherence problem has been 
identified. One of the data units may not have been updated, even though the parity 
information corresponding to that update has been made. In the alternative, one of 
the data units may have been updated, but the parity unit not updated. Put another 
way, the writing of data in a parity redundancy group involves writing (1) the new 
data; and (2) updating parity to correspond to the new data. For example, if data on
the D1 track is updated during normal operation, this would result in new data being
written to the D1 track and also to the P track. If there is a data coherence
problem, it arises from writing either D1 or P, but failing to write the other. In
this case, D1 has new/updated data and D1-new has the old (not updated) version of
D1, or vice versa. (The first corresponds to failure to update parity, while the
second corresponds to a failure to update D1 when parity was updated.)

Detailed Description Text (87) : 

At a step 122a, the D1-new track is examined to determine if it is a viable track
(i.e., is internally consistent with respect to its format and other parameters). If
D1-new is not viable, then the coherence problem must have arisen from failure to
update data on a different track (or the parity track when that data was written)
because D1-new is "garbage."

Detailed Description Text (91) : 

In the alternative, if D1 bears the newer time stamp, the data coherence problem
arose from a failure to update parity when D1 was updated. Accordingly, at a step
124, P is determined invalid and the repair process may proceed.

Detailed Description Text (94) : 

In the embodiment of FIG. 11, it is assumed at step 122c that the correctness of the
format in D1-new is sufficient to identify the exact data coherence problem: failure
to update D1 although the P unit was updated, or vice versa. If the nature of the
formatting is insufficient to draw this conclusion at this point, then other data
tracks may still be examined, by continuing at step 123. If more than one possible
data coherence problem is identified, a complete log of the possible data coherence
problems can be generated for a further examination of what data coherence problem
or problems may exist on this track. This situation may arise, for example, in the
event that a write to one data track (e.g., D2) but not parity does not result in a
format violation when another track (e.g., D1-new) is generated from parity and the
one data track (e.g., D2 XOR P). D1-new will have a correct format but the data
coherence problem actually exists elsewhere.
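
The regeneration of a candidate track from parity and the remaining data tracks (e.g., D2 XOR P) can be sketched as below. This is a hedged illustration, not the method claimed in the patent: tracks are modeled as equal-length byte strings keyed by hypothetical names, and the format viability and time stamp checks described in the text are omitted.

```python
from functools import reduce


def xor_tracks(tracks):
    """Bytewise XOR across a list of equal-length tracks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*tracks))


def regenerate(target, data_tracks, parity):
    """Rebuild the named track from parity and every other data track."""
    others = [t for name, t in data_tracks.items() if name != target]
    return xor_tracks(others + [parity])


def is_coherent(data_tracks, parity):
    """True when the stored parity equals the XOR of all data tracks."""
    return xor_tracks(list(data_tracks.values())) == parity
```

If D1 is rewritten without a matching parity update, `is_coherent` returns False and `regenerate("D1", ...)` yields the old version of D1, which corresponds to the "failure to update parity" case in the text.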



Detailed Description Text (95) :

Returning to step 122a, if the D1-new format has been determined not to be viable,
then the D1-new unit is simply garbage data. The data coherence problem has not yet
been identified; the problem is either a failure to update parity or data on a
different disk in the redundancy group. Processing continues, therefore, at a step
123.



